Tutorial contnet:

Motivation:

This dataset is an list of video game sales collated from 1980 till 2020 and contains the data of the game release year,rank, sales by region, platform of release among other data.
During this tutorial we will explore the dataset and make observations to answer following critical questions:

(1). What are the best 10 publishers ranked by their overall sales in North America?
(2). Is the highest ranked publisher in the previous question still the best publishers in the recent years?
(3). Are the rank in question 1 stays stable till the recent years? What are the changes, if any?
(4). What is the overall most welcomed video game genre in North America?

After exploring the dataset we would try to build regression model and predict the Sales of video games in North America.

Import the packages and libraries:

All libraries we used below are imported here.

library(tidyverse)
library(plotly)
library(sqldf)
suppressWarnings(library(caret))
suppressWarnings(library(glmnet))

Import Data:

The data is imported from kaggle, you can find more information about the dtataset from this link: https://www.kaggle.com/gregorut/videogamesales/data#
The dataset has been tidied:
(1) removed entities with N/A’s in Year attribute,
(2) since the data ay year 2017 and 2020 are not sufficient, we remove the entities with such years from the dataset.
There are total of 16327 rows, 11 columns in this dataset (entities with years of N/A’s, 2017’s, and 2020’s are removed).

videogamesales=read.csv("vgsales.csv",header=TRUE)
videogamesales <- videogamesales %>%  filter(Year!="N/A" & Year!="2017" & Year!="2020" )
head(videogamesales)
##   Rank                     Name Platform Year        Genre Publisher NA_Sales
## 1    1               Wii Sports      Wii 2006       Sports  Nintendo    41.49
## 2    2        Super Mario Bros.      NES 1985     Platform  Nintendo    29.08
## 3    3           Mario Kart Wii      Wii 2008       Racing  Nintendo    15.85
## 4    4        Wii Sports Resort      Wii 2009       Sports  Nintendo    15.75
## 5    5 Pokemon Red/Pokemon Blue       GB 1996 Role-Playing  Nintendo    11.27
## 6    6                   Tetris       GB 1989       Puzzle  Nintendo    23.20
##   EU_Sales JP_Sales Other_Sales Global_Sales
## 1    29.02     3.77        8.46        82.74
## 2     3.58     6.81        0.77        40.24
## 3    12.88     3.79        3.31        35.82
## 4    11.01     3.28        2.96        33.00
## 5     8.89    10.22        1.00        31.37
## 6     2.26     4.22        0.58        30.26

Explore dataset:

Here we will freely discove the trends and distributions in the datasets, and try to answer the critical questions below:

(1). What are the best 10 publishers ranked by their overall sales in North America?
(2). Is the highest ranked publisher in the previous question still the best publishers in the recent years?
(3). Are the rank in question 1 stays stable till the recent years? What are the changes, if any?
(4). What is the overall most welcomed video game genre in North America?

  • Explore Video game releases by year:

    First we start by exploring the number of releases through the years, and generate following bar graph.
# Function to format plot into a more readable display
formatPlot <- function() {
 return (theme(axis.text.x = element_text(angle = 90, size = 10, vjust = 0.4), plot.title = element_text(size = 15, vjust = 2),axis.title.x = element_text(size = 12, vjust = -0.35)))
}
ggplot(videogamesales, aes(Year)) + 
  geom_bar(fill = "purple") +
  formatPlot() +
  ggtitle("Video Game Releases by Year")

From the plot above we observe that the video game sales was limited by technologies before 1995. There is huge spike in the number of releases after 2000 and it peaked during 2008 and 2009. It has started declining last few years.

  • Explore Sales by Year and regions:

    We generated a multiple line chart to findout trends of sales of different regions.
# in order to draw a line chart, convert attribute year into numeric values
videogamesales$Year <- as.numeric(as.character(videogamesales$Year))
All<- sqldf("SELECT Year, sum(NA_Sales) as AME, sum(EU_Sales) as EU,sum(JP_Sales) as JP, sum(Other_Sales) as Other, sum(Global_Sales) as Global from videogamesales group by Year order by Year")

# multiple line chart
plot_ly()%>%
  add_trace(y=All$Global,x=All$Year,name='Global Sale',mode = 'lines')%>%
  add_trace(y=All$AME,x=All$Year,name='American Sale',mode = 'lines')%>%
  add_trace(y=All$EU,x=All$Year,name='Europe Sale',mode = 'lines') %>%
  add_trace(y=All$JP,x=All$Year,name='Japen Sales',mode = 'lines')%>%
  add_trace(y=All$Other,x=All$Year,name='Other Sales',mode = 'lines')%>%
  layout(title = "Sales by Year in All Platforms",
         scene = list( xaxis = list(title = "Year"), yaxis = list(title = "Sales (in Millions)")))

As observed from the multiple line chart, the the sales in all regions share the similar trend: The growing rates of sales spikes after 1995, and peaked at around 2007.Later the sales in all regions has decreased after 2007.

  • Explore Major Platforms by sales in North America:

    We first aggregate the overall sales in North America, grouped by distinct platforms, then we use the aggregated data to generate a horzontal bar chart of the top 10 platforms.
# aggregate data
revenue_by_platform <- videogamesales %>% 
                    group_by(Platform) %>%
                    summarize(Sum_NA_Sales = sum(NA_Sales)) %>%
                    arrange(desc(Sum_NA_Sales)) %>% 
                    slice(1:10)

# horizontal bar graph
ggplot(revenue_by_platform, aes(x = reorder(Platform, -Sum_NA_Sales) , y = Sum_NA_Sales)) + 
  geom_bar(fill = "maroon", stat = "identity") +
  formatPlot()+
  ggtitle("Sum_NA_Sales by Platform")+
  xlab("Top 10 Platform")+
  ylab("Total Sales in North America (in Millions)")

From this plot we ranked the top 10 platform by their totoal sales in North America. But we could use more information to find the popular platforms in the recent years, which lead us to the following exploration.

  • Explore Platforms by North America Sales over the years:

    Extract the highest ranked platform for each year by aggregating overall sales in NorthAmerica, when grouped by year and platform. And then plot a bar graph of its Sales vs year with color legend of platforms.
# aggregating data, extracting top 1 platform for each year
top_platforms <- videogamesales %>%
             group_by(Year, Platform) %>%
             summarize(Sum_NA_Sales = sum(NA_Sales)) %>%
             arrange(desc(Sum_NA_Sales)) %>%
             top_n(1)
top_platforms
## # A tibble: 37 x 3
## # Groups:   Year [37]
##     Year Platform Sum_NA_Sales
##    <dbl> <fct>           <dbl>
##  1  2009 Wii             117. 
##  2  2010 X360            107. 
##  3  2008 Wii              98.8
##  4  2004 PS2              96.8
##  5  2002 PS2              96.5
##  6  2011 X360             86.7
##  7  2005 PS2              85.6
##  8  2007 Wii              84.2
##  9  2003 PS2              83.5
## 10  1998 PS               83.2
## # ... with 27 more rows
# bar plot with color legend representing different platforms.
ggplot(top_platforms, aes(x = Year, y = Sum_NA_Sales, fill = Platform)) + 
  geom_bar(stat = "identity") +
  formatPlot() +
  ggtitle("Top Platform by Total North America Sales Each Year") +
  ylab("Total Sales in North America (in Millions)")

By obserbing the bar graph above, we discovered that once a new popular platform is introduced to the market, it stays as the top platform a few years. Also, playstation platform is the most famous in the market for about 20 years. PS4 is the most popular platform in the recent years.

  • Explore Publishers by North America Sales:

    Aggregate data, ranking publishers by overall North America Sales, extract top 10 publishers and plot a bar chart of the top 10 publishers.
totoalsale <- sum(videogamesales$NA_Sales)

BestSellingPublisher <- videogamesales %>% 
                    group_by(Publisher) %>%
                    summarize(Sum_NA_Sales = sum(NA_Sales)) %>%
                    arrange(desc(Sum_NA_Sales))

revenue_by_publisher <- BestSellingPublisher %>% 
                    slice(1:10) %>%
                    mutate(percentage=100*Sum_NA_Sales/totoalsale)

revenue_by_publisher
## # A tibble: 10 x 3
##    Publisher                   Sum_NA_Sales percentage
##    <fct>                              <dbl>      <dbl>
##  1 Nintendo                            816.      18.8 
##  2 Electronic Arts                     584.      13.5 
##  3 Activision                          426.       9.83
##  4 Sony Computer Entertainment         265.       6.12
##  5 Ubisoft                             253.       5.83
##  6 Take-Two Interactive                220.       5.09
##  7 THQ                                 209.       4.81
##  8 Microsoft Game Studios              155.       3.59
##  9 Sega                                109.       2.51
## 10 Atari                               101.       2.34
ggplot(revenue_by_publisher, aes(x = reorder(Publisher, Sum_NA_Sales) , y = Sum_NA_Sales)) + 
  geom_bar(fill = "deepskyblue", stat = "identity") +
  formatPlot()+
  ggtitle("Sum_NA_Sales by Publishers")+
  xlab("Top 10 Publisher")+
  ylab("Total Sales in North America (in Millions)") +
  coord_flip()

From the bar chart and table above, we discovered the top 10 publishers, ranking by the overall sales in North America. Nintendo is the top most publisher who takes up about 20% of the overall NA_Sales. Another discocovery is that the top 10 publichsers takes about 70% of the video games market of North America. Again, we can explore more about the popular publicshers in the recent years.

  • Explore Top Publishers by North America Sales Over the Years:

    Extract the highest ranked publisher for each year by aggregating overall sales in North America, when grouped by year and publisher. And then plot a bar graph of its Sales vs year with color legend of publisher.
top_publishers <- videogamesales %>%
             group_by(Year, Publisher) %>%
             summarize(Sum_NA_Sales = sum(NA_Sales)) %>%
             arrange(desc(Sum_NA_Sales)) %>%
             top_n(1)
top_publishers
## # A tibble: 37 x 3
## # Groups:   Year [37]
##     Year Publisher       Sum_NA_Sales
##    <dbl> <fct>                  <dbl>
##  1  2006 Nintendo                90.4
##  2  2009 Nintendo                53.2
##  3  2005 Electronic Arts         46.8
##  4  2008 Electronic Arts         45.1
##  5  2007 Nintendo                42.8
##  6  2004 Electronic Arts         41.0
##  7  2002 Electronic Arts         40.7
##  8  2003 Electronic Arts         40.5
##  9  1989 Nintendo                39.8
## 10  2010 Activision              39.5
## # ... with 27 more rows
ggplot(top_publishers, aes(x = Year, y = Sum_NA_Sales, fill = Publisher)) + 
  geom_bar(stat = "identity") +
  formatPlot() +
  ggtitle("Top Publishers by Total North America Sales Each Year") +
  ylab("Total Sales in North America (in Millions)")

From this bar graph we found that Nintendo almost stayed as top publisher from 1984 to 1999. After 1999, Electronic Arts stayed as the top publisher for most of years, but Nintendo still had significant hight sales in 2006, 2007 and 2009. It worth a notice that in the recent years, Acitivision appears as top publisher,this publisher can be potentially our choice in the futrue. To explore the NA_Sales’ growth rate over the years, we plan to draw a line chart as below.

  • Explore Specific Publishers’ growth rate by North America Sales Over the Years:

    Aggreaget and generate a dataframe with the sales of publishers for each year, the plot a multiple line chart.
# extracting aggregated columns 
Nin_NA <- videogamesales %>%
             filter(Publisher=="Nintendo") %>% 
             group_by(Year, Publisher) %>%
             summarize(Nin_NA = sum(NA_Sales)) %>%
             select(Nin_NA,Year)
EA_NA <- videogamesales %>%
             filter(Publisher=="Electronic Arts") %>% 
             group_by(Year, Publisher) %>%
             summarize(EA_NA = sum(NA_Sales)) %>%
             select(EA_NA,Year)
Act_NA <- videogamesales %>%
             filter(Publisher=="Activision") %>% 
             group_by(Year, Publisher) %>%
             summarize(Act_NA = sum(NA_Sales)) %>%
             select(Act_NA,Year)
Sony_NA <- videogamesales %>%
             filter(Publisher=="Sony Computer Entertainment") %>% 
             group_by(Year, Publisher) %>%
             summarize(Sony_NA = sum(NA_Sales)) %>%
             select(Sony_NA,Year)
Ubi_NA <- videogamesales %>%
             filter(Publisher=="Ubisoft") %>% 
             group_by(Year, Publisher) %>%
             summarize(Ubi_NA = sum(NA_Sales)) %>%
             select(Ubi_NA,Year)
Take_NA <- videogamesales %>%
             filter(Publisher=="Take-Two Interactive") %>% 
             group_by(Year, Publisher) %>%
             summarize(Take_NA = sum(NA_Sales)) %>%
             select(Take_NA,Year)

# merge aggregated columns to a new dataframe
platformsalesbyyear<-merge(Nin_NA,EA_NA,by.x = "Year") %>% 
                     merge(Act_NA, by.x = "Year") %>% 
                     merge(Sony_NA, by.x = "Year") %>% 
                     merge(Ubi_NA, by.x = "Year") %>%
                     merge(Take_NA, by.x = "Year")

# multiple line chart 
plot_ly()%>%
  add_trace(y=platformsalesbyyear$Nin_NA,x=platformsalesbyyear$Year,name='Nintendo',mode = 'lines')%>%
  add_trace(y=platformsalesbyyear$EA_NA,x=platformsalesbyyear$Year,name='Electronic Arts',mode = 'lines')%>%
  add_trace(y=platformsalesbyyear$Act_NA,x=platformsalesbyyear$Year,name='Activision',mode = 'lines') %>%
  add_trace(y=platformsalesbyyear$Sony_NA,x=platformsalesbyyear$Year,name='Sony Computer Entertainment',mode = 'lines')%>%
  add_trace(y=platformsalesbyyear$Ubi_NA,x=platformsalesbyyear$Year,name='Ubisoft',mode = 'lines')%>%
  add_trace(y=platformsalesbyyear$Take_NA,x=platformsalesbyyear$Year,name='Take-Two Interactive',mode = 'lines')%>%
  layout(title = "Trends of Sales in North America over the Years, for specific platforms",
         scene = list( xaxis = list(title = "Year"), yaxis = list(title = "NA_Sales (in Millions)")))

By observing the linechart above, we discovered: (1) Though Nintendo owns 20% of market in overall NA_sales market, in the past few years its sales has decreased, and it stays around 3rd and 4th place in the North America market since 2010. (2) Electronic Arts and Activision have higher sales ranking in after 2010, both of these publisher are becoming the top publishers from 2010 to 2016.

  • Explore Top Generes by Average Sales in North America:

    Extract the highest ranked genre for each year by aggregating overall sales in North America, when grouped by year and genre. And then plot a bar graph of its Sales vs year with color legend of genres.
top_genre <- videogamesales %>% 
         group_by(Year, Genre) %>% 
         summarize(Count = n(), Sum_NA_Sales = sum(NA_Sales), AvgSales=Sum_NA_Sales/Count) %>%
         top_n(1)


ggplot(top_genre, aes(Year,AvgSales, fill = Genre)) + 
  geom_bar(stat = "identity") +
  ggtitle("Top Genre by Average NA Sales each Year") +
  ylab("Average NA Sales (in Millions")+
  formatPlot() 

By observing the bar graph, we discover that: (1) Shooter game has higher average sales in north america in past few years. (2) The average sales of top Ganre was suprisingly high from 1980 to 1992, the reason can be that there were less games and less genre was produced, therefore poppular ones have significant high average sales.

  • Answers found during the exploration:


    (1). What are the best 10 publishers ranked by their overall sales in North America?
    Nintendo, Electronic Arts, Activision, Sony Computer Entertainment, Ubisoft, Take-Two Interactive, THQ, Microsift Game Studios, Sega, and Atari.

    (2). Is the highest ranked publisher in the previous question still the best publishers in the recent years?
    Not really, later Electronic Arts and Activision became the top publishers in the recent years.

    (3). Are the ranking (top 5) in question 1 stays stable till the recent years? What are the changes, if any?
    The ranking of Nintendo has lowered, rankings ofElectronic Arts and Activision has rised.

    (4). What is the overall most welcomed video game genre in North America?
    Shooter is the overall most popular genere in North America. It is still the most popular one in the recent years.

Making Prediction

Since our goal was to predict Sales in North America (NA_Sales), we decide to do regression and prodict this variable, using elastic net regularization for our linear regression model.

# group the platforms into producers of the platforms: (Mainly Nintendo, Sony, Microsoft, Sega and the others).
videogamesales$Platform <- as.character(videogamesales$Platform)

videogamesales$Platform[videogamesales$Platform %in%
                          c("Wii","NES","GB","DS","SNES","GBA","3DS","N64","GC","WiiU")] <- "Nintendo"
videogamesales$Platform[videogamesales$Platform %in% 
                          c("PS","PS2","PS3","PS4","PSP","PSV")] <- "Sony"
videogamesales$Platform[videogamesales$Platform %in% 
                          c("XB","XOne","X360")] <- "Microsoft"
videogamesales$Platform[videogamesales$Platform %in% 
                          c("GG","DC","SAT","GEN")] <- "Sega"
videogamesales$Platform[!(videogamesales$Platform  %in% 
                            c("Nintendo","Sony","Microsoft","Sega"))] <- "Other"
videogamesales$Platform <- as.factor(videogamesales$Platform)

# To reduce the publisher factors by their overall sales, 
BestSellingPublisher$Publisher <- as.character(BestSellingPublisher$Publisher)
#  $$$ for the publishers who has overall NA Sales over 50 million, 
#  $$  for the publishers who has overall NA Sales lower than 50 million and over 10 miliion,
#  $   for the publishers who has overall NA Sales lower than 10 million and over 1 million,
#  X   for the publishers who has overall NA Sales lower than 1 million

videogamesales$Publisher <- as.character(videogamesales$Publisher)
videogamesales$Publisher[videogamesales$Publisher %in%
                 BestSellingPublisher$Publisher[BestSellingPublisher$Sum_NA_Sales >= 50.00]] <- "$$$"
videogamesales$Publisher[videogamesales$Publisher %in%
                 BestSellingPublisher$Publisher[BestSellingPublisher$Sum_NA_Sales >= 10.00 & BestSellingPublisher$Sum_NA_Sales < 50.00]] <- "$$"
videogamesales$Publisher[videogamesales$Publisher %in%
                 BestSellingPublisher$Publisher[BestSellingPublisher$Sum_NA_Sales >= 1.00 & BestSellingPublisher$Sum_NA_Sales < 10.00]] <- "$"
videogamesales$Publisher[videogamesales$Publisher %in%
                 BestSellingPublisher$Publisher[BestSellingPublisher$Sum_NA_Sales < 1.00]] <- "X"

videogamesales$Publisher <- as.factor(videogamesales$Publisher)

# overview of current dataset
str(videogamesales)
## 'data.frame':    16323 obs. of  11 variables:
##  $ Rank        : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Name        : Factor w/ 11493 levels "'98 Koshien",..: 10991 9343 5532 10993 7370 9707 6648 10989 6651 2594 ...
##  $ Platform    : Factor w/ 5 levels "Microsoft","Nintendo",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Year        : num  2006 1985 2008 2009 1996 ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 11 5 7 11 8 6 5 4 5 9 ...
##  $ Publisher   : Factor w/ 4 levels "$","$$","$$$",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ NA_Sales    : num  41.5 29.1 15.8 15.8 11.3 ...
##  $ EU_Sales    : num  29.02 3.58 12.88 11.01 8.89 ...
##  $ JP_Sales    : num  3.77 6.81 3.79 3.28 10.22 ...
##  $ Other_Sales : num  8.46 0.77 3.31 2.96 1 0.58 2.9 2.85 2.26 0.47 ...
##  $ Global_Sales: num  82.7 40.2 35.8 33 31.4 ...
set.seed(17)

videogamesales <- videogamesales[sample(nrow(videogamesales)),]

# split
training <- videogamesales[1:8162,]
test <- videogamesales[8163:16323,]

validate <- test 
training<- unique(training)
test <- unique(test)
test$Global_Sales <- NA
test <- test[!(names(test) %in% c("Name"))]

# overview of training dataframe
str(training)
## 'data.frame':    8162 obs. of  11 variables:
##  $ Rank        : int  10325 14546 14539 1778 10514 2579 13777 12460 1986 12919 ...
##  $ Name        : Factor w/ 11493 levels "'98 Koshien",..: 1164 8089 9209 3120 3681 6472 9939 1205 6285 11493 ...
##  $ Platform    : Factor w/ 5 levels "Microsoft","Nintendo",..: 5 3 2 5 5 5 5 1 2 2 ...
##  $ Year        : num  2011 2009 2004 2011 1995 ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 4 8 11 3 3 11 8 2 2 1 ...
##  $ Publisher   : Factor w/ 4 levels "$","$$","$$$",..: 3 3 1 3 1 3 4 3 3 3 ...
##  $ NA_Sales    : num  0.06 0 0.02 0.45 0 0.44 0 0.05 0.62 0 ...
##  $ EU_Sales    : num  0.03 0.02 0.01 0.51 0 0.19 0 0.01 0.32 0 ...
##  $ JP_Sales    : num  0 0 0 0 0.1 0.03 0.04 0 0 0.05 ...
##  $ Other_Sales : num  0.02 0.01 0 0.19 0.01 0.13 0 0 0.1 0 ...
##  $ Global_Sales: num  0.11 0.03 0.03 1.15 0.1 0.8 0.04 0.06 1.05 0.05 ...
# over view of test dataframe
str(test)
## 'data.frame':    8161 obs. of  10 variables:
##  $ Rank        : int  11763 3504 4237 2348 8977 8317 2355 950 12677 14512 ...
##  $ Platform    : Factor w/ 5 levels "Microsoft","Nintendo",..: 2 5 2 2 5 1 2 2 2 5 ...
##  $ Year        : num  2009 2009 2007 2006 2011 ...
##  $ Genre       : Factor w/ 12 levels "Action","Adventure",..: 2 11 1 7 8 8 8 8 11 2 ...
##  $ Publisher   : Factor w/ 4 levels "$","$$","$$$",..: 4 3 2 3 1 2 2 2 3 4 ...
##  $ NA_Sales    : num  0.07 0.28 0.4 0.81 0 0.03 0 0.25 0.05 0 ...
##  $ EU_Sales    : num  0 0.22 0.03 0 0 0.12 0 0.07 0 0 ...
##  $ JP_Sales    : num  0 0 0 0.01 0.15 0 0.89 1.49 0 0.03 ...
##  $ Other_Sales : num  0 0.07 0.04 0.07 0 0.01 0 0.02 0 0 ...
##  $ Global_Sales: logi  NA NA NA NA NA NA ...
# numeric value defining the amount of shrinkage
lambdagrid <- 10 ^ seq(2,-2,length = 100)

# elastic net mixing parameter
alphagrid <-  seq(0,1, length= 10 )

# train control
trnControl <- trainControl(
                method = "repeatedCV",
                number= 10,
                 repeats = 5)

srchGrd = expand.grid(.alpha = alphagrid, .lambda = lambdagrid)

formula <- NA_Sales ~ Publisher + Genre + Platform 

# model 1 and 2
model <- train(formula, data=training, method = 'glmnet', tuneGrid= srchGrd, trControl = trnControl,
              standardize=TRUE, maxit= 1000000 )
model2 <- lm(formula, data = training)

model$bestTune
##    alpha     lambda
## 19     0 0.05336699
final <- model$finalModel

prediction <- predict(model, test, s= final$lambda.min)
summary(prediction)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.1541  0.1427  0.3111  0.2748  0.3761  0.6506
str(prediction)
##  Named num [1:8161] -0.00149 0.32187 0.19701 0.42873 0.14046 ...
##  - attr(*, "names")= chr [1:8161] "11579" "3454" "4179" "2317" ...
prediction2<- predict(model2,test)

Prediction

Following output are the first 20 predictions (others are too long and has been ommited), from the second output we calculated the error of prediction is around 0.7.

Eval <- data.frame(Game= validate$Name, Actual = validate$NA_Sales)

prediction <- round(prediction,2)

#To be replaced
Eval <- Eval[1:length(prediction),]
Eval$Predicted <- abs(prediction)


Eval$diff <- abs(Eval$Predicted - Eval$Actual)
MSER <- sqrt(mean(Eval$diff^2))


Eval <- data.frame(Game= validate$Name, Actual = validate$NA_Sales)
prediction2 <- round(prediction2,2)

#To be replaced
Eval <- Eval[1:length(prediction2),]
Eval$Predicted <- abs(prediction2)

Eval$diff <- abs(Eval$Predicted - Eval$Actual)
head(Eval,20)
##                                          Game Actual Predicted diff
## 1                                       Again   0.07      0.00 0.07
## 2                            NCAA Football 10   0.28      0.32 0.04
## 3                         Crash of the Titans   0.40      0.19 0.21
## 4                  Monster 4X4: World Circuit   0.81      0.44 0.37
## 5              Black * Rock Shooter: The Game   0.00      0.12 0.12
## 6                   The Rise of the Argonauts   0.03      0.32 0.29
## 7                           Seiken Densetsu 3   0.00      0.27 0.27
## 8                              Secret of Mana   0.25      0.27 0.02
## 9                         MLB Power Pros 2008   0.05      0.39 0.34
## 10                           S.Y.K Renshouden   0.00      0.07 0.07
## 11   Jimmie Johnson's Anything With an Engine   0.07      0.37 0.30
## 12                        Transformers: Prime   0.13      0.37 0.24
## 13          Deathrow: Underground Team Combat   0.05      0.44 0.39
## 14              InuYasha: A Feudal Fairy Tale   0.00      0.36 0.36
## 15 X-Men Origins: Wolverine - Uncaged Edition   0.42      0.42 0.00
## 16           Call of Duty: The War Collection   0.05      0.61 0.56
## 17                         Grand Theft Auto V   7.01      0.30 6.71
## 18                Saints Row: Gat out of Hell   0.00      0.12 0.12
## 19               The Walking Dead: Season One   0.12      0.03 0.09
## 20              Gabrielle's Ghostly Groove 3D   0.08      0.00 0.08
MSER <- sqrt(mean(Eval$diff^2))
MSER
## [1] 0.6908453